Abstract

Finite mixture models are statistical models which appear in many problems in statistics and machine learning. In such models it is assumed that data are drawn from random probability measures, called mixture components, which are themselves drawn from a probability measure $\mathscr{P}$ over probability measures. When estimating mixture models, it is common to make assumptions on the mixture components, such as parametric assumptions. In this paper, we make no assumption on the mixture components, and instead assume that observations from the mixture model are grouped, such that observations in the same group are known to be drawn from the same component. We show that any mixture of $m$ probability measures can be uniquely identified provided there are $2m - 1$ observations per group. Moreover we show that, for any $m$, there exists a mixture of $m$ probability measures that cannot be uniquely identified when groups have $2m - 2$ observations. Our results hold for any sample space with more than one element.

1. Introduction

A finite mixture model is a probability law based on a finite number of probability measures, $\mu_1, \dots, \mu_m$, and a discrete distribution $w_1, \dots, w_m$. A realization of a mixture model is generated by first selecting a component index $k$ at random, $1 \le k \le m$, and then drawing from $\mu_k$. A mixture model can be associated with a probability measure on probability measures, which we denote $\mathscr{P}$. Mixture models are used to model data throughout statistics and machine learning.

A primary theoretical question concerning mixture models is identifiability. A mixture model is said to be identifiable if no other mixture model (of equal or lesser complexity) explains the distribution of the data. Some previous work on identifiability considers the situation where the observations are drawn iid from the mixture model, and conditions on $\mu_1, \dots, \mu_m$ are imposed, such as Gaussianity (Dasgupta and Schulman, 2007; Anderson et al., 2014). In this work we make no assumptions on $\mu_1, \dots, \mu_m$. Instead, we assume the observations are grouped, such that realizations from the same group are known to be iid from the same component. We call these groups of samples “random groups.” We define a random group to be a random collection $\mathbf{X}_i$, where $\mathbf{X}_i = X_{i,1}, \dots, X_{i,n}$ are iid draws from a component $\mu_i$ and $\mu_i \sim \mathscr{P}$.

Consider the set of all mixtures of probability measures which yield the same distribution over the random groups as does $\mathscr{P}$. If some element of this set other than $\mathscr{P}$ has no more components than $\mathscr{P}$ then $\mathscr{P}$ is not identifiable. In other words, there is no way to differentiate $\mathscr{P}$ from another model of equal or lesser complexity. Fortunately, with a sufficient number of samples in each random group, $\mathscr{P}$ becomes the simplest model which describes the data. In this paper we show that, for any sample space, any mixture of probability measures with $m$ components is identifiable when there are $2m - 1$ samples per random group. Furthermore we show that this bound cannot be improved, regardless of sample space.

1.1. Applications of Probability Measures over Probability Measures 

Though somewhat mathematically abstract objects, probability measures over spaces of probability measures arise quite naturally in many statistical problems. Any application which uses mixture models, for example clustering, is utilizing a probability measure over probability measures. Moreover, mixture models are a subset of a larger class of models known as latent variable models. One problem in latent variable models which has seen significant interest recently is topic modeling. Topic modeling is concerned with the extraction of some sort of topical structure from a collection of documents. Many popular methods for topic modeling assume that each document in question has a latent variable representing a “topic” or a random convex combination of topics which determines the distribution of words in that document (Blei et al., 2003; Anandkumar et al., 2014; Arora et al., 2012).

Another statistical problem which often utilizes a probability measure over probability measures is transfer learning. In transfer learning one is interested in utilizing several different but related training datasets (perhaps a collection of datasets which correspond to different patients in a study) to construct some sort of classifier or regressor for another different but related testing dataset. There are many approaches to this problem but one formulation assumes that each dataset is generated from a random probability measure and each random measure is generated from a fixed probability measure over probability measures (Blanchard et al., 2011; Maurer et al., 2013).

Finally, sometimes we would like to perform statistical techniques directly on a space of probability measures. Examples of this include detection of anomalous distributions (Muandet and Schölkopf, 2013) and distribution regression (Póczos et al., 2013; Szabó et al., 2014).

1.2. How Does Group Size Affect Consistency? 

Many of the applications above assume a model similar to the one we described in the first paragraph. They assume there exists some probability measure, $\mathscr{P}$, over a space of probability measures from which we have observed groups of data $\mathbf{X}_1, \dots, \mathbf{X}_N$ with $\mathbf{X}_i = X_{i,1}, \dots, X_{i,M_i} \overset{iid}{\sim} \mu_i$ and $\mu_i \overset{iid}{\sim} \mathscr{P}$. For example in topic modeling each $\mathbf{X}_i$ is a document which contains $M_i$ words and in transfer learning $\mathbf{X}_i$ is one of the several different training datasets. Proposed algorithms for solving these problems often contain some sort of consistency result and these results typically require that $N \to \infty$ and either $M_i \to \infty$ for all $i$ or that $\mathscr{P}$ satisfies some properties which make $M_i \to \infty$ unnecessary. When considering such results one may wonder what sort of statistical penalty we incur from fixing $M_i = C$ for all $i$.

While this question is clearly interesting from a theoretical perspective it has a couple of important practical implications. Firstly, it is not uncommon for $C$ to be restricted in practice. An example of this is topic modeling of Twitter documents, where the restricted character count keeps each $M_i$ quite small. The second important practical consideration is that some latent variable techniques do not utilize the full sample $\mathbf{X}_i$ and instead break down $\mathbf{X}_i$ into many pairs or triplets of samples for analysis (Anandkumar et al., 2014; Arora et al., 2012). It is important to know what, if anything, is lost from doing this. Though we do not provide a direct answer to this question, our results suggest that such techniques may significantly limit what can be known about $\mathscr{P}$.

2. Related Work

The question of how many samples are necessary in each random group to uniquely identify a finite mixture of measures has come up sporadically over the past couple of decades. Kruskal's theorem (Kruskal, 1977) has been applied to obtain various identifiability results for random groups containing three samples. In Allman et al. (2009) it was shown that any mixture of linearly independent measures over a discrete space or linearly independent probability distributions on $\mathbb{R}$ is identifiable from random groups containing three samples. In Hettmansperger and Thomas (2000) it was shown that a mixture of $m$ probability measures on $\mathbb{R}$ is identifiable from random groups of size $2m - 1$ provided there exists some point in $\mathbb{R}$ where the cdf of each mixture component at that point is distinct. The result most closely resembling our own is in Rabani et al. (2013). In that paper they show that a mixture of $m$ probability measures over a discrete domain is identifiable with $2m - 1$ samples in each random group. They also show that this bound is tight and provide a consistent algorithm for estimating arbitrary mixtures of measures over a discrete domain.

Our proofs are quite different from other related identifiability results and rely on tools from functional analysis. Other results in the same vein as ours rely on algebraic or spectral theoretic tools. Our proofs rely on two techniques. The first technique is the embedding of finite collections of measures in some Hilbert space. The second technique is using the properties of symmetric tensors over $\mathbb{R}^d$ and applying them to tensor products of Hilbert spaces. Our proofs are not totally detached from the algebraic techniques but the algebraic portions are hidden away in previous results about symmetric tensors.

3. Problem Setup 

We will be treating this problem in as general a setting as possible. For any measurable space we define $\delta_x$ as the Dirac measure at $x$. For $\Theta$ a set, $\sigma$-algebra, or measure, we denote $\Theta^{\times a}$ to be the standard $a$-fold product associated with that object. For any natural number $k$ we define $[k] = \mathbb{N} \cap [1, k]$. Let $\Omega$ be a set containing more than one element. This set is the sample space of our data. Let $\mathcal{F}$ be a $\sigma$-algebra over $\Omega$. Assume $\mathcal{F} \neq \{\emptyset, \Omega\}$. We denote the space of probability measures over this space as $\mathcal{D}(\Omega, \mathcal{F})$, which we will shorten to $\mathcal{D}$. We will equip $\mathcal{D}$ with the $\sigma$-algebra $2^{\mathcal{D}}$ so that each Dirac measure over $\mathcal{D}$ is unique. Define $\Delta(\mathcal{D}) = \operatorname{span}(\delta_x : x \in \mathcal{D})$. This will be the ambient space where our mixtures of probability measures live. Let $\mathscr{P} = \sum_{i=1}^{m} w_i \delta_{\mu_i}$ be a probability measure in $\Delta(\mathcal{D})$. Let $\mu \sim \mathscr{P}$ and $X_1, \dots, X_n \overset{iid}{\sim} \mu$. We will denote $\mathbf{X} = (X_1, \dots, X_n)$.

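We will now derive the probability law of $\mathbf{X}$. Let $A \in \mathcal{F}^{\times n}$; we have

$$
\begin{aligned}
P(\mathbf{X} \in A) &= \sum_{i=1}^{m} P(\mathbf{X} \in A \mid \mu = \mu_i)\, P(\mu = \mu_i) \\
&= \sum_{i=1}^{m} w_i \mu_i^{\times n}(A).
\end{aligned}
$$

The second equality follows from Lemma 3.10 in Kallenberg (2002). So the probability law of $\mathbf{X}$ is

$$
\sum_{i=1}^{m} w_i \mu_i^{\times n}. \tag{1}
$$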
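
The two-stage sampling scheme that leads to (1) can be sketched numerically. The snippet below is only an illustration under stated assumptions: the function name `sample_random_groups`, the Gaussian components, and the particular weights are hypothetical choices made for the example and do not appear in the paper.

```python
import numpy as np

def sample_random_groups(weights, samplers, n, num_groups, rng):
    """Draw `num_groups` random groups of size `n` from a finite mixture.

    weights  : mixing weights w_1..w_m (nonnegative, summing to one)
    samplers : list of m callables; samplers[k](rng, n) returns n iid draws
               from the k-th component (hypothetical stand-ins for mu_k)
    """
    weights = np.asarray(weights, dtype=float)
    # Latent component index for each group: P(mu = mu_k) = w_k.
    ks = rng.choice(len(weights), size=num_groups, p=weights)
    # All n observations in a group share the same latent component.
    groups = np.stack([samplers[k](rng, n) for k in ks])
    return ks, groups

# Illustrative components (an assumption): two Gaussians with different means.
samplers = [
    lambda rng, n: rng.normal(-2.0, 1.0, size=n),
    lambda rng, n: rng.normal(3.0, 1.0, size=n),
]
rng = np.random.default_rng(0)
ks, groups = sample_random_groups([0.3, 0.7], samplers, n=3, num_groups=5, rng=rng)
```

Each row of `groups` plays the role of one realization of $\mathbf{X}$; the component indices `ks` are returned only for inspection and would be unobserved in the setting studied here.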

We want to view the probability law of $\mathbf{X}$ as a function of $\mathscr{P}$ in a mathematically rigorous way, which requires a bit of technical buildup. Let $V$ be a vector space. We will now construct a version of the integral for $V$-valued functions over $\mathcal{D}$. Let $\mathscr{Q} \in \Delta(\mathcal{D})$. From the definition of $\Delta(\mathcal{D})$ it follows that $\mathscr{Q}$ admits the representation

$$
\mathscr{Q} = \sum_{i=1}^{r} \alpha_i \delta_{\mu_i}.
$$

From the well-ordering principle there must exist some representation with minimal $r$ and we define this $r$ as the order of $\mathscr{Q}$. We can show that the minimal representation of any $\mathscr{Q} \in \Delta(\mathcal{D})$ is unique up to permutation of its indices.

Definition 1 We call $\mathscr{P}$ a mixture of measures if it is a probability measure in $\Delta(\mathcal{D})$. We will say that $\mathscr{P}$ has $m$ mixture components if it has order $m$.

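Lemma 2 Let $\mathscr{Q} \in \Delta(\mathcal{D})$ admit minimal representations $\mathscr{Q} = \sum_{i=1}^{r} \alpha_i \delta_{\mu_i} = \sum_{j=1}^{r} \alpha'_j \delta_{\mu'_j}$. There exists some permutation $\psi : [r] \to [r]$ such that $\mu_{\psi(i)} = \mu'_i$ and $\alpha_{\psi(i)} = \alpha'_i$ for all $i$.

Proof Because both representations are minimal it follows that $\alpha'_i \neq 0$ for all $i$ and $\mu'_i \neq \mu'_j$ for all $i \neq j$. From this we know $\mathscr{Q}(\{\mu'_i\}) \neq 0$ for all $i$. Because $\mathscr{Q}(\{\mu'_i\}) \neq 0$ for all $i$ it follows that for any $i$ there exists some $j$ such that $\mu'_i = \mu_j$. Let $\psi : [r] \to [r]$ be a function satisfying $\mu'_i = \mu_{\psi(i)}$. Because the elements within each representation are distinct, $\psi$ must be injective and thus a permutation. Again from this distinctness we get that, for all $i$, $\mathscr{Q}(\{\mu'_i\}) = \alpha'_i = \alpha_{\psi(i)}$ and we are done. ■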

Henceforth when we define an element of $\Delta(\mathcal{D})$ with a summation we will assume that the summation is a minimal representation. Any minimal representation of a mixture of measures $\mathscr{P}$ with $m$ components satisfies $\mathscr{P} = \sum_{i=1}^{m} w_i \delta_{\mu_i}$ with $w_i > 0$ for all $i$ and $\sum_{i=1}^{m} w_i = 1$. So any mixture of measures is a convex combination of Dirac measures at elements in $\mathcal{D}$.

For a function $f : \mathcal{D} \to V$ define

$$
\int_{\mathcal{D}} f(\mu)\, d\mathscr{Q}(\mu) = \sum_{i=1}^{r} \alpha_i f(\mu_i),
$$

where $\sum_{i=1}^{r} \alpha_i \delta_{\mu_i}$ is a minimal representation of $\mathscr{Q}$. This integral is well defined as a consequence of Lemma 2.

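For a measurable space $(\Psi, \mathcal{G})$ we define $\mathcal{M}(\Psi, \mathcal{G})$ as the space of all finite signed measures over that space. Let $\lambda_n : \mathcal{M}(\Omega, \mathcal{F}) \to \mathcal{M}(\Omega^{\times n}, \mathcal{F}^{\times n})$; $\mu \mapsto \mu^{\times n}$. We introduce the operator $V_n : \Delta(\mathcal{D}) \to \mathcal{M}(\Omega^{\times n}, \mathcal{F}^{\times n})$,

$$
V_n(\mathscr{Q}) = \int \lambda_n(\mu)\, d\mathscr{Q}(\mu) = \int \mu^{\times n}\, d\mathscr{Q}(\mu).
$$

For a minimal representation $\mathscr{Q} = \sum_{i=1}^{r} \alpha_i \delta_{\mu_i}$ we have

$$
V_n(\mathscr{Q}) = \sum_{i=1}^{r} \alpha_i \mu_i^{\times n}.
$$

From this definition we have that $V_n(\mathscr{P})$ is simply the law of $\mathbf{X}$ which we derived earlier. Two mixtures of measures are different if they admit a different measure over $\mathcal{D}$.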

Definition 3 We call a mixture of measures, $\mathscr{P}$, $n$-identifiable if there does not exist a different mixture of measures $\mathscr{P}'$, with order no greater than the order of $\mathscr{P}$, such that $V_n(\mathscr{P}) = V_n(\mathscr{P}')$.

Definition 3 is the central object of interest in this paper. Given a mixture of measures, $\mathscr{P} = \sum_{i=1}^{m} w_i \delta_{\mu_i}$, then $V_n(\mathscr{P})$ is equal to $\sum_{i=1}^{m} w_i \mu_i^{\times n}$, the measure from which $\mathbf{X}$ is drawn. In topic modeling $\mathbf{X}$ would be the samples from a single document and in transfer learning it would be one of the several collections of training samples. If $\mathscr{P}$ is not $n$-identifiable then we know that there exists a mixture of measures which is no more complex (in terms of number of mixture components) than $\mathscr{P}$ which is not discernible from $\mathscr{P}$ given the data. Practically speaking this means we need more samples in each random group $\mathbf{X}$ in order for the full richness of $\mathscr{P}$ to be manifested in $\mathbf{X}$.

4. Results 

Our primary result gives us a bound on the $n$-identifiability of all mixtures of measures with $m$ or fewer components. We also show that this bound is tight.

Theorem 4 Let $(\Omega, \mathcal{F})$ be a measurable space. Mixtures of measures with $m$ components are $(2m - 1)$-identifiable.

Theorem 5 Let $(\Omega, \mathcal{F})$ be a measurable space with $\mathcal{F} \neq \{\emptyset, \Omega\}$. For all $m$, there exists a mixture of measures with $m$ components which is not $(2m - 2)$-identifiable.

Unsurprisingly, if a mixture of measures is $n$-identifiable then it is $q$-identifiable for all $q > n$. Likewise if a mixture of measures is not $n$-identifiable then it is not $q$-identifiable for $q < n$. Thus identifiability is, in some sense, monotonic.

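Lemma 6 If a mixture of measures is $n$-identifiable then it is $q$-identifiable for all $q > n$.

Proof We will proceed by contradiction. Let $\mathscr{P} = \sum_{i=1}^{l} a_i \delta_{\mu_i}$ be $n$-identifiable, let $\mathscr{P}' = \sum_{j=1}^{r} b_j \delta_{\nu_j}$ be a different mixture of measures with $r \le l$ and

$$
\sum_{i=1}^{l} a_i \mu_i^{\times q} = \sum_{j=1}^{r} b_j \nu_j^{\times q}
$$

for some $q > n$. Let $A \in \mathcal{F}^{\times n}$ be arbitrary. We have

$$
\begin{aligned}
\sum_{i=1}^{l} a_i \mu_i^{\times q}\left(A \times \Omega^{\times q-n}\right) &= \sum_{j=1}^{r} b_j \nu_j^{\times q}\left(A \times \Omega^{\times q-n}\right) \\
\Rightarrow \sum_{i=1}^{l} a_i \mu_i^{\times n}(A)\, \mu_i^{\times q-n}\left(\Omega^{\times q-n}\right) &= \sum_{j=1}^{r} b_j \nu_j^{\times n}(A)\, \nu_j^{\times q-n}\left(\Omega^{\times q-n}\right) \\
\Rightarrow \sum_{i=1}^{l} a_i \mu_i^{\times n}(A) &= \sum_{j=1}^{r} b_j \nu_j^{\times n}(A).
\end{aligned}
$$

This implies that $\mathscr{P}$ is not $n$-identifiable, a contradiction. ■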


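Lemma 7 If a mixture of measures is not $n$-identifiable then it is not $q$-identifiable for any $q < n$.

Proof Let a mixture of measures $\mathscr{P} = \sum_{i=1}^{l} a_i \delta_{\mu_i}$ not be $n$-identifiable. It follows that there exists a different mixture of measures $\mathscr{P}' = \sum_{j=1}^{r} b_j \delta_{\nu_j}$, with $r \le l$, such that

$$
\sum_{i=1}^{l} a_i \mu_i^{\times n} = \sum_{j=1}^{r} b_j \nu_j^{\times n}.
$$

Let $A \in \mathcal{F}^{\times q}$ be arbitrary; we have

$$
\begin{aligned}
\sum_{i=1}^{l} a_i \mu_i^{\times n}\left(A \times \Omega^{\times n-q}\right) &= \sum_{j=1}^{r} b_j \nu_j^{\times n}\left(A \times \Omega^{\times n-q}\right) \\
\Rightarrow \sum_{i=1}^{l} a_i \mu_i^{\times q}(A) &= \sum_{j=1}^{r} b_j \nu_j^{\times q}(A)
\end{aligned}
$$

and therefore $\mathscr{P}$ is not $q$-identifiable. ■

Viewed alternatively these results say that $n = 2m - 1$ is the smallest value for which $V_n$ is injective over the set of all minimal mixtures of measures with $m$ or fewer components.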


5. Tensor Products of Hilbert Spaces 

Our proofs will rely heavily on the geometry of tensor products of Hilbert spaces which we will introduce in this section.


5.1. Overview of Tensor Products 

First we introduce tensor products of Hilbert spaces. To our knowledge there does not exist a rigorous construction of the tensor product Hilbert space which is both succinct and intuitive. Because of this we will simply state some basic facts about tensor products of Hilbert spaces and hopefully instill some intuition for the uninitiated by way of example. A thorough treatment of tensor products of Hilbert spaces can be found in Kadison and Ringrose (1983).

Let $H$ and $H'$ be Hilbert spaces. From these two Hilbert spaces the “simple tensors” are elements of the form $h \otimes h'$ with $h \in H$ and $h' \in H'$. We can treat the simple tensors as being the basis for some inner product space $H_0$, with the inner product of simple tensors satisfying

$$
\langle h_1 \otimes h'_1, h_2 \otimes h'_2 \rangle = \langle h_1, h_2 \rangle \langle h'_1, h'_2 \rangle.
$$

The tensor product of $H$ and $H'$ is the completion of $H_0$ and is denoted $H \otimes H'$. To avoid potential confusion we note that the notation just described is standard in operator theory literature. In some literature our definition of $H_0$ is denoted as $H \otimes H'$ and our definition of $H \otimes H'$ is denoted $H \hat{\otimes} H'$.

As an illustrative example we consider the tensor product $L^2(\mathbb{R}) \otimes L^2(\mathbb{R})$. It can be shown that there exists an isomorphism between $L^2(\mathbb{R}) \otimes L^2(\mathbb{R})$ and $L^2(\mathbb{R}^2)$ which maps the simple tensors to separable functions, $f \otimes f' \mapsto f(\cdot) f'(\cdot)$. We can demonstrate this isomorphism with a simple example. Let $f, g, f', g' \in L^2(\mathbb{R})$. Taking the $L^2(\mathbb{R}^2)$ inner product of $f(\cdot) f'(\cdot)$ and $g(\cdot) g'(\cdot)$ gives us

$$
\begin{aligned}
\iint \left(f(x) f'(y)\right)\left(g(x) g'(y)\right) dx\, dy &= \int f(x) g(x)\, dx \int f'(y) g'(y)\, dy \\
&= \langle f, g \rangle \langle f', g' \rangle \\
&= \langle f \otimes f', g \otimes g' \rangle.
\end{aligned}
$$
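
The factorization of the inner product over simple tensors can be checked in a finite-dimensional analogue. The sketch below is an illustration only: it replaces $L^2(\mathbb{R})$ by $\mathbb{R}^5$ and $\mathbb{R}^7$ (an assumption made so the computation is exact) and realizes simple tensors as outer products; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Four arbitrary vectors standing in for f, g (first factor) and f', g' (second factor).
f, g = rng.normal(size=5), rng.normal(size=5)
f2, g2 = rng.normal(size=7), rng.normal(size=7)

# Simple tensors realised as outer products: (f (x) f')[i, j] = f[i] * f'[j].
F = np.outer(f, f2)
G = np.outer(g, g2)

lhs = np.sum(F * G)                   # inner product in the product space
rhs = np.dot(f, g) * np.dot(f2, g2)   # product of the factor inner products
```

Up to floating-point rounding, `lhs` and `rhs` agree, which mirrors the chain of equalities above with sums in place of integrals.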

Beyond the tensor product we will need to define the tensor power. To begin we will first show that tensor products are, in some sense, associative. Let $H_1, H_2, H_3$ be Hilbert spaces. Proposition 2.6.5 in Kadison and Ringrose (1983) states that there is a unique unitary operator, $U : (H_1 \otimes H_2) \otimes H_3 \to H_1 \otimes (H_2 \otimes H_3)$, which satisfies the following for all $h_1 \in H_1$, $h_2 \in H_2$, $h_3 \in H_3$,

$$
U\left((h_1 \otimes h_2) \otimes h_3\right) = h_1 \otimes (h_2 \otimes h_3).
$$

This implies that for any collection of Hilbert spaces, $H_1, \dots, H_n$, the Hilbert space $H_1 \otimes \cdots \otimes H_n$ is defined unambiguously regardless of how we decide to associate the products. In the space $H_1 \otimes \cdots \otimes H_n$ we define a simple tensor as a vector of the form $h_1 \otimes \cdots \otimes h_n$ with $h_i \in H_i$. In Kadison and Ringrose (1983) it is shown that $H_1 \otimes \cdots \otimes H_n$ is the closure of the span of these simple tensors. To conclude this primer on tensor products we introduce the following notation. For a Hilbert space $H$ we denote $H^{\otimes n} = \underbrace{H \otimes H \otimes \cdots \otimes H}_{n \text{ times}}$ and for $h \in H$, $h^{\otimes n} = \underbrace{h \otimes h \otimes \cdots \otimes h}_{n \text{ times}}$.

5.2. Some Results for Tensor Product Spaces

We will now state some technical results which will be useful for the rest of the paper. These lemmas are similar to or are straightforward extensions of previous results which we needed to modify for our particular purposes. Let $(\Psi, \mathcal{G}, \gamma)$ be a $\sigma$-finite measure space. We have the following lemma which connects the $L^2$ space of products of measures to the tensor products of the $L^2$ space for each measure. The proof of this lemma is straightforward but technical and can be found in the appendix.

Lemma 8 There exists a unitary transform $U : L^2(\Psi, \mathcal{G}, \gamma)^{\otimes n} \to L^2(\Psi^{\times n}, \mathcal{G}^{\times n}, \gamma^{\times n})$ such that, for all $f_1, \dots, f_n \in L^2(\Psi, \mathcal{G}, \gamma)$, $U(f_1 \otimes \cdots \otimes f_n) = f_1(\cdot) \cdots f_n(\cdot)$.

The following lemma is used in the proof of Lemma 8 as well as the proof of Theorem 5. The proof of this lemma is also not particularly interesting and can be found in the appendix.

Lemma 9 Let $H_1, \dots, H_n, H'_1, \dots, H'_n$ be a collection of Hilbert spaces and $U_1, \dots, U_n$ a collection of unitary operators with $U_i : H_i \to H'_i$ for all $i$. There exists a unitary operator $U : H_1 \otimes \cdots \otimes H_n \to H'_1 \otimes \cdots \otimes H'_n$ satisfying $U(h_1 \otimes \cdots \otimes h_n) = U_1(h_1) \otimes \cdots \otimes U_n(h_n)$ for all $h_1 \in H_1, \dots, h_n \in H_n$.

Lemma 10 Let $n > 1$ and let $h_1, \dots, h_n$ be elements of a Hilbert space such that no elements are zero and no pairs of elements are collinear. Then $h_1^{\otimes n-1}, \dots, h_n^{\otimes n-1}$ are linearly independent.

A statement of this lemma for $\mathbb{R}^d$ can be found in Comon et al. (2008). We present our own proof for the Hilbert space setting.

Proof We will proceed by induction. For $n = 2$ the lemma clearly holds. Suppose the lemma holds for $n - 1$ and let $h_1, \dots, h_n$ satisfy the assumptions in the lemma statement. Let $\alpha_1, \dots, \alpha_n$ satisfy

$$
\sum_{i=1}^{n} \alpha_i h_i^{\otimes n-1} = 0. \tag{2}
$$

To finish the proof we will show that $\alpha_1$ must be zero, which can be generalized to any $\alpha_i$ without loss of generality. Let $H_1$ and $H_2$ be Hilbert spaces and let $\mathscr{HS}(H_1, H_2)$ be the space of Hilbert-Schmidt operators from $H_1$ to $H_2$. Hilbert-Schmidt operators are a closed subspace of bounded linear operators. Proposition 2.6.9 in Kadison and Ringrose (1983) states that for a pair of Hilbert spaces $H_1, H_2$ there exists a unitary operator $U : H_1 \otimes H_2 \to \mathscr{HS}(H_1, H_2)$ such that $U(g_1 \otimes g_2) = g_1 \langle g_2, \cdot \rangle$. Applying this operator to (2) we get

$$
\sum_{i=1}^{n} \alpha_i h_i^{\otimes n-2} \langle h_i, \cdot \rangle = 0. \tag{3}
$$

Because $h_1$ and $h_n$ are linearly independent we can choose $z$ such that $\langle h_1, z \rangle \neq 0$ and $z \perp h_n$. Plugging $z$ into (3) yields

$$
\sum_{i=1}^{n-1} \alpha_i h_i^{\otimes n-2} \langle h_i, z \rangle = 0
$$

and therefore $\alpha_1 = 0$ by the inductive hypothesis. ■
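
Lemma 10 can be illustrated numerically in $\mathbb{R}^2$. The following sketch is not part of the proof; it assumes a particular set of four pairwise non-collinear vectors, and the helper names `tensor_power` and `rank_of_powers` are hypothetical. Tensor powers are realized as repeated Kronecker products.

```python
from functools import reduce
import numpy as np

def tensor_power(h, k):
    """Return h (x) ... (x) h (k factors) flattened into a vector, k >= 1."""
    return reduce(np.kron, [h] * k)

def rank_of_powers(vectors, k):
    """Rank of the family {h^{(x) k}} for the given vectors."""
    return np.linalg.matrix_rank(np.stack([tensor_power(h, k) for h in vectors]))

# Four nonzero, pairwise non-collinear vectors in R^2 (illustrative choice).
vectors = [np.array([1.0, t]) for t in (0.0, 1.0, 2.0, 3.0)]

rank_n_minus_1 = rank_of_powers(vectors, 3)  # power n - 1, as in Lemma 10
rank_n_minus_2 = rank_of_powers(vectors, 2)  # one power lower
```

With $n = 4$ vectors, the third tensor powers have rank 4, in line with the lemma, while the second tensor powers only have rank 3 because symmetric order-2 tensors over a two-dimensional space form a three-dimensional space. The same drop in rank is what the proof of Theorem 5 exploits.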


6. Proofs of Theorems 

With the tools developed in the previous sections we can now prove our theorems. First we introduce one additional piece of notation. For a function $p$ on a domain $\mathcal{X}$ we define $p^{\times k}$ as simply the product of the function $k$ times on the domain $\mathcal{X}^{\times k}$, $\underbrace{p(\cdot) \cdots p(\cdot)}_{k \text{ times}}$. For a measure the notation continues to denote the standard product measure.

Finally we will need the following technical lemma to connect the product of Radon-Nikodym derivatives to product measures. The proof is straightforward and can be found in the appendix.

Lemma 11 Let $(\Psi, \mathcal{G})$ be a measurable space, $\eta$ and $\gamma$ a pair of bounded measures on that space, and $f$ a nonnegative function in $L^1(\gamma)$ such that, for all $A \in \mathcal{G}$, $\eta(A) = \int_A f\, d\gamma$. Then for all $n$, for all $B \in \mathcal{G}^{\times n}$ we have

$$
\eta^{\times n}(B) = \int_B f^{\times n}\, d\gamma^{\times n}.
$$
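
Proof of Theorem 4 We will proceed by contradiction. Suppose there exist two different mixtures of measures $\mathscr{P} = \sum_{i=1}^{l} a_i \delta_{\mu_i} \neq \mathscr{P}' = \sum_{j=1}^{m} b_j \delta_{\nu_j}$ such that

$$
\sum_{i=1}^{l} a_i \mu_i^{\times 2m-1} = \sum_{j=1}^{m} b_j \nu_j^{\times 2m-1}
$$

and $l \le m$. From our assumption on representation we know $\mu_i \neq \mu_j$ for all $i \neq j$ and similarly for $\nu_1, \dots, \nu_m$. We will also assume that $\mu_i \neq \nu_j$ for all $i, j$. Were this not true we could simply subtract the smaller of the common terms from both sides of the equality above and normalize to yield another pair of distinct mixtures of measures with fewer components and no shared terms, $\mathscr{Q}$ and $\mathscr{Q}'$. Let $\mathscr{Q}$ have $m'$ components and $\mathscr{Q}'$ have $l'$ with $m' \ge l'$. If $m \neq m'$ then we can apply Lemma 7 to give us $V_{2m'-1}(\mathscr{Q}) = V_{2m'-1}(\mathscr{Q}')$ and proceed as usual.

Let $\xi = \sum_{i=1}^{l} \mu_i + \sum_{j=1}^{m} \nu_j$. Clearly $\xi$ dominates $\mu_i$ and $\nu_j$ for all $i, j$ so we can define Radon-Nikodym derivatives $p_i = \frac{d\mu_i}{d\xi}$, $q_j = \frac{d\nu_j}{d\xi}$ which are in $L^1(\Omega, \mathcal{F}, \xi)$. We can assert that these derivatives are everywhere nonnegative without issue. Clearly no two of these derivatives are equal. If one of the derivatives were a scalar multiple of another, for example $p_1 = \alpha p_2$ for some $\alpha \neq 1$, it would imply

$$
\mu_1(\Omega) = \int_\Omega p_1\, d\xi = \int_\Omega \alpha p_2\, d\xi = \alpha.
$$

This is not true so no pair of these derivatives are collinear.

Lemma 11 tells us that, for any $R \in \mathcal{F}^{\times 2m-1}$,

$$
\begin{aligned}
\int_R \sum_{i=1}^{l} a_i p_i^{\times 2m-1}\, d\xi^{\times 2m-1} &= \sum_{i=1}^{l} a_i \mu_i^{\times 2m-1}(R) \\
&= \sum_{j=1}^{m} b_j \nu_j^{\times 2m-1}(R) \\
&= \int_R \sum_{j=1}^{m} b_j q_j^{\times 2m-1}\, d\xi^{\times 2m-1}.
\end{aligned}
$$

Therefore

$$
\sum_{i=1}^{l} a_i p_i^{\times 2m-1} = \sum_{j=1}^{m} b_j q_j^{\times 2m-1} \tag{4}
$$

$\xi^{\times 2m-1}$-almost everywhere (Proposition 2.23 in Folland (1999)). We will now show for all $i, j$ that $p_i \in L^2(\Omega, \mathcal{F}, \xi)$ and $q_j \in L^2(\Omega, \mathcal{F}, \xi)$. We will argue this for $p_1$, which will clearly generalize to the other elements. First we will show that $p_1 \le 1$ $\xi$-almost everywhere. Suppose this were not true and that there exists $A \in \mathcal{F}$ with $\xi(A) > 0$ and $p_1 > 1$ on $A$. Now we would have

$$
\mu_1(A) = \int_A p_1\, d\xi > \int_A 1\, d\xi = \xi(A) = \sum_{i=1}^{l} \mu_i(A) + \sum_{j=1}^{m} \nu_j(A) \ge \mu_1(A),
$$

a contradiction. Evaluating directly we get

$$
\begin{aligned}
\int p_1(\omega)^2\, d\xi(\omega) &\le \int 1\, d\xi(\omega) \\
&= \xi(\Omega) \\
&= l + m,
\end{aligned}
$$

so $p_1 \in L^2(\Omega, \mathcal{F}, \xi)$. Applying the $U^{-1}$ operator from Lemma 8 to (4) yields

$$
\sum_{i=1}^{l} a_i p_i^{\otimes 2m-1} = \sum_{j=1}^{m} b_j q_j^{\otimes 2m-1}.
$$

Since $l + m \le 2m$, Lemma 10 states that $p_1^{\otimes 2m-1}, \dots, p_l^{\otimes 2m-1}, q_1^{\otimes 2m-1}, \dots, q_m^{\otimes 2m-1}$ are all linearly independent and thus $a_i = 0$ and $b_j = 0$ for all $i, j$, a contradiction. ■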

Proof of Theorem 5 To prove this theorem we will construct a pair of different mixtures of measures, $\mathscr{P} \neq \mathscr{P}'$, which both contain $m$ components and satisfy $V_{2m-2}(\mathscr{P}) = V_{2m-2}(\mathscr{P}')$.

From our definition of $(\Omega, \mathcal{F})$ we know there exists $F \in \mathcal{F}$ such that $F, F^C$ are nonempty. Let $f \in F$ and $f' \in F^C$. It follows that $\delta_f \neq \delta_{f'}$ are different probability measures on $(\Omega, \mathcal{F})$. Because $\delta_f$ and $\delta_{f'}$ are dominated by $\xi = \delta_f + \delta_{f'}$ we know that there exists a pair of measurable functions $p, p'$ such that, for all $A$, $\delta_f(A) = \int_A p\, d\xi$ and $\delta_{f'}(A) = \int_A p'\, d\xi$. We can assert that $p$ and $p'$ are nonnegative without issue.

From the same argument we used in the proof of Theorem 4 we know $p, p' \in L^2(\Omega, \mathcal{F}, \xi)$. Let $H_2$ be the Hilbert space generated from the span of $p, p'$. Let $(\varepsilon_i)_{i=1}^{2m}$ be $2m$ distinct elements of $[0, 1]$ and let $(p_i)_{i=1}^{2m}$ be elements of $L^2(\Omega, \mathcal{F}, \xi)$, with $p_i = \varepsilon_i p + (1 - \varepsilon_i) p'$. Clearly $p_i$ is a pdf over $\xi$ for all $i$ and there are no pairs in this collection which are collinear. Since $H_2$ is isomorphic to $\mathbb{R}^2$ there exists a unitary operator $U : H_2 \to \mathbb{R}^2$. From Lemma 9 there exists a unitary operator $U_{2m-2} : H_2^{\otimes 2m-2} \to (\mathbb{R}^2)^{\otimes 2m-2}$ with $U_{2m-2}(h_1 \otimes \cdots \otimes h_{2m-2}) = U(h_1) \otimes \cdots \otimes U(h_{2m-2})$. Because $U$ is unitary, $U_{2m-2}$ maps the set $\operatorname{span}\left(\{h^{\otimes 2m-2} : h \in H_2\}\right)$ exactly to the set $\operatorname{span}\left(\{x^{\otimes 2m-2} : x \in \mathbb{R}^2\}\right)$. An order $r$ tensor, $A_{i_1, \dots, i_r}$, is symmetric if $A_{i_{\psi(1)}, \dots, i_{\psi(r)}} = A_{i_1, \dots, i_r}$ for any $i_1, \dots, i_r$ and permutation $\psi$. A consequence of Lemma 4.2 in Comon et al. (2008) is that $\operatorname{span}\left(\{x^{\otimes 2m-2} : x \in \mathbb{R}^2\}\right) \subset S^{2m-2}(\mathbb{C}^2)$, where $S^{2m-2}(\mathbb{C}^2)$ is the space of all symmetric order $2m - 2$ tensors over $\mathbb{C}^2$.

From Proposition 3.4 in Comon et al. (2008) it follows that the dimension of $S^{2m-2}(\mathbb{C}^2)$ is

$$
\binom{2 + 2m - 2 - 1}{2m - 2} = 2m - 1.
$$

From this we get that $\dim\left(\operatorname{span}\left(\{h^{\otimes 2m-2} : h \in H_2\}\right)\right) \le 2m - 1$.

The bound on the dimension of $\operatorname{span}\left(\{h^{\otimes 2m-2} : h \in H_2\}\right)$ implies that $\left(p_i^{\otimes 2m-2}\right)_{i=1}^{2m}$ are linearly dependent. Conversely Lemma 10 implies that removing a single vector from $\left(p_i^{\otimes 2m-2}\right)_{i=1}^{2m}$ yields a set of vectors which are linearly independent. It follows that there exists $(\alpha_i)_{i=1}^{2m}$ with $\alpha_i \neq 0$ for all $i$ and

$$
\sum_{i=1}^{2m} \alpha_i p_i^{\otimes 2m-2} = 0.
$$
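
Without loss of generality we will assume that $\alpha_i < 0$ for $i \in [k]$ and $\alpha_j > 0$ for $j > k$, with $k \le m$. From this we have

$$
\sum_{i=1}^{k} -\alpha_i p_i^{\otimes 2m-2} = \sum_{j=k+1}^{2m} \alpha_j p_j^{\otimes 2m-2}. \tag{5}
$$

From Lemma 8 we have

$$
\sum_{i=1}^{k} -\alpha_i p_i^{\times 2m-2} = \sum_{j=k+1}^{2m} \alpha_j p_j^{\times 2m-2}
$$

and thus

$$
\begin{aligned}
\int \sum_{i=1}^{k} -\alpha_i p_i^{\times 2m-2}\, d\xi^{\times 2m-2} &= \int \sum_{j=k+1}^{2m} \alpha_j p_j^{\times 2m-2}\, d\xi^{\times 2m-2} \\
\Rightarrow \sum_{i=1}^{k} -\alpha_i &= \sum_{j=k+1}^{2m} \alpha_j.
\end{aligned}
$$

Let $r = \sum_{i=1}^{k} -\alpha_i$. We know $r > 0$ so dividing both sides of (5) by $r$ gives us

$$
\sum_{i=1}^{k} \frac{-\alpha_i}{r} p_i^{\otimes 2m-2} = \sum_{j=k+1}^{2m} \frac{\alpha_j}{r} p_j^{\otimes 2m-2}
$$

and the left and the right side are convex combinations. Let $(\beta_i)_{i=1}^{2m}$ be positive numbers with $\beta_i = \frac{-\alpha_i}{r}$ for $i \in \{1, \dots, k\}$ and $\beta_j = \frac{\alpha_j}{r}$ for $j \in \{k + 1, \dots, 2m\}$. This gives us

$$
\sum_{i=1}^{k} \beta_i p_i^{\otimes 2m-2} = \sum_{j=k+1}^{2m} \beta_j p_j^{\otimes 2m-2}.
$$

It follows that

$$
\sum_{i=1}^{k} \beta_i p_i^{\otimes m-1} \otimes p_i^{\otimes m-1} = \sum_{j=k+1}^{2m} \beta_j p_j^{\otimes m-1} \otimes p_j^{\otimes m-1}.
$$

We will now show that $k = m$. Suppose $k < m$. Then $p_1^{\otimes m-1}, \dots, p_{k+1}^{\otimes m-1}$ are linearly independent. From this we know that there exists $z$ such that $z \perp p_i^{\otimes m-1}$ for $i \in [k]$ but $z$ is not orthogonal to $p_{k+1}^{\otimes m-1}$. Identifying both sides of the previous equality with Hilbert-Schmidt operators, as in the proof of Lemma 10, and using this vector we have

$$
\left\langle \left(\sum_{i=1}^{k} \beta_i p_i^{\otimes m-1} \left\langle p_i^{\otimes m-1}, \cdot \right\rangle\right) z, z \right\rangle = \sum_{i=1}^{k} \beta_i \left\langle p_i^{\otimes m-1}, z \right\rangle^2 = 0
$$

but

$$
\left\langle \left(\sum_{j=k+1}^{2m} \beta_j p_j^{\otimes m-1} \left\langle p_j^{\otimes m-1}, \cdot \right\rangle\right) z, z \right\rangle = \sum_{j=k+1}^{2m} \beta_j \left\langle p_j^{\otimes m-1}, z \right\rangle^2 > 0,
$$

and thus $k = m$.

Now we have

$$
\sum_{i=1}^{m} \beta_i p_i^{\otimes 2m-2} = \sum_{j=m+1}^{2m} \beta_j p_j^{\otimes 2m-2}.
$$

Applying Lemma 8 we get that

$$
\sum_{i=1}^{m} \beta_i p_i^{\times 2m-2} = \sum_{j=m+1}^{2m} \beta_j p_j^{\times 2m-2}.
$$

From Lemma 11 we have,

$$
\sum_{i=1}^{m} \beta_i \left(\varepsilon_i \delta_f + (1 - \varepsilon_i) \delta_{f'}\right)^{\times 2m-2} = \sum_{j=m+1}^{2m} \beta_j \left(\varepsilon_j \delta_f + (1 - \varepsilon_j) \delta_{f'}\right)^{\times 2m-2}.
$$

Setting $\mu_i = \varepsilon_i \delta_f + (1 - \varepsilon_i) \delta_{f'}$ yields

$$
\sum_{i=1}^{m} \beta_i \mu_i^{\times 2m-2} = \sum_{j=m+1}^{2m} \beta_j \mu_j^{\times 2m-2}.
$$

Thus setting $\mathscr{P} = \sum_{i=1}^{m} \beta_i \delta_{\mu_i}$ and $\mathscr{P}' = \sum_{j=m+1}^{2m} \beta_j \delta_{\mu_j}$ gives us $V_{2m-2}(\mathscr{P}) = V_{2m-2}(\mathscr{P}')$ and $\mathscr{P} \neq \mathscr{P}'$ by construction. ■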

6.1. Discussion of the Proof of Theorem 5 

In the previous proof we could have replaced $\delta_f, \delta_{f'}$ with any distinct pair of probability measures on $(\Omega, \mathcal{F})$. Thus the pair $\mathscr{P}, \mathscr{P}'$ are not pathological because of some property of each individual mixture component, but because of the geometry of the mixture components considered as a whole. The measures $\mu_1, \dots, \mu_{2m}$ are convex combinations of $\delta_f$ and $\delta_{f'}$ and therefore lie in a one dimensional affine subspace of $\Delta(\mathcal{D})$. The space of Bernoulli measures similarly lies in a subspace between two measures, the point mass at 0 and the point mass at 1. Given a mixture of Bernoulli distributions, the sum of iid samples of Bernoulli random variables is a binomial distribution. We can draw a connection between our result and the identifiability of mixtures of binomial distributions.

Consider a mixture of $m$ Bernoulli distributions with parameters $\lambda_1, \dots, \lambda_m$ and weights $w_1, \dots, w_m$. Suppose we have $n$ samples in each random group. If we let $Y_i$ be the sum of the random group $\mathbf{X}_i$ then the probability law of $Y_i$ is a mixture of binomial random variables. Let $p(\lambda, n)$ be the distribution of a binomial random variable with parameters $n$ and $\lambda$. Specifically we have that the distribution of $Y_i$ is $\sum_{j=1}^{m} w_j\, p(\lambda_j, n)$. In Blischke (1964) it was shown that $n \ge 2m - 1$ is a necessary and sufficient condition for the identifiability of the parameters $\lambda_1, \dots, \lambda_m$ from the samples $Y_i$. We find these similarities provoking but are not prepared to make more precise connections at this time.
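
The Bernoulli case gives a concrete instance of the construction in the proof of Theorem 5. The sketch below is an illustration under assumptions, not the authors' procedure: it takes $\Omega = \{0, 1\}$, treats each $\varepsilon_i$ as the Bernoulli parameter of $\mu_i$, and finds the coefficients $\alpha_i$ as a null vector of a Vandermonde-type matrix (matching the moments of order $0, \dots, 2m - 2$). The function names and the specific parameters are hypothetical.

```python
import numpy as np

def nonidentifiable_pair(eps):
    """Split 2m distinct Bernoulli parameters into two m-component mixtures
    whose group laws agree for groups of size 2m - 2."""
    eps = np.asarray(eps, dtype=float)
    two_m = eps.size
    # Rows are eps**k for k = 0, ..., 2m - 2; the null space is one-dimensional.
    V = np.vander(eps, N=two_m - 1, increasing=True).T
    alpha = np.linalg.svd(V)[2][-1]
    pos, neg = alpha > 0, alpha < 0
    return ((eps[pos], alpha[pos] / alpha[pos].sum()),
            (eps[neg], alpha[neg] / alpha[neg].sum()))

def group_law(eps, w, n):
    """Probability of one fixed binary sequence with s ones, for s = 0..n."""
    s = np.arange(n + 1)
    per_component = eps[:, None] ** s * (1.0 - eps[:, None]) ** (n - s)
    return w @ per_component

(e1, w1), (e2, w2) = nonidentifiable_pair([0.1, 0.3, 0.6, 0.9])  # m = 2
same_at_2 = np.allclose(group_law(e1, w1, 2), group_law(e2, w2, 2))
same_at_3 = np.allclose(group_law(e1, w1, 3), group_law(e2, w2, 3))
```

For this choice with $m = 2$, the two mixtures induce the same law on groups of size $2m - 2 = 2$ (`same_at_2` is true) but different laws on groups of size $2m - 1 = 3$ (`same_at_3` is false), consistent with Theorems 4 and 5. Because the group law is exchangeable, the probabilities returned by `group_law` determine the full distribution over binary sequences of length $n$.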

7. Conclusion

In this paper we have proven a fundamental bound on the identifiability of mixture models in a nonparametric setting. Any mixture with $m$ components is identifiable with groups of samples containing $2m - 1$ samples from the same latent probability measure. We show that this bound is tight by constructing a mixture of $m$ probability measures which is not identifiable with groups of samples containing $2m - 2$ samples. These results hold for any mixture over any domain with at least two elements.
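
Appendix A. Additional Proofs

Proof of Lemma 8 Example 2.6.11 in Kadison and Ringrose (1983) states that for any two $\sigma$-finite measure spaces $(S, \mathscr{S}, m)$, $(S', \mathscr{S}', m')$ there exists a unitary operator $U : L^2(S, \mathscr{S}, m) \otimes L^2(S', \mathscr{S}', m') \to L^2(S \times S', \mathscr{S} \times \mathscr{S}', m \times m')$ such that, for all $f, g$,

$$
U(f \otimes g) = f(\cdot) g(\cdot).
$$

Because $(\Psi, \mathcal{G}, \gamma)$ is a $\sigma$-finite measure space it follows that $(\Psi^{\times m}, \mathcal{G}^{\times m}, \gamma^{\times m})$ is a $\sigma$-finite measure space for all $m \in \mathbb{N}$. We will now proceed by induction. Clearly the lemma holds for $n = 1$. Suppose the lemma holds for $n - 1$. From the induction hypothesis we know that there exists a unitary transform $U_{n-1} : L^2(\Psi, \mathcal{G}, \gamma)^{\otimes n-1} \to L^2(\Psi^{\times n-1}, \mathcal{G}^{\times n-1}, \gamma^{\times n-1})$ which maps all simple tensors $f_1 \otimes \cdots \otimes f_{n-1} \mapsto f_1(\cdot) \cdots f_{n-1}(\cdot)$. Combining $U_{n-1}$ with the identity map via Lemma 9 we can construct a unitary operator $T_n : L^2(\Psi, \mathcal{G}, \gamma)^{\otimes n-1} \otimes L^2(\Psi, \mathcal{G}, \gamma) \to L^2(\Psi^{\times n-1}, \mathcal{G}^{\times n-1}, \gamma^{\times n-1}) \otimes L^2(\Psi, \mathcal{G}, \gamma)$, which maps $f_1 \otimes \cdots \otimes f_{n-1} \otimes f_n \mapsto f_1(\cdot) \cdots f_{n-1}(\cdot) \otimes f_n$.

From the aforementioned example there exists a unitary transform $K_n : L^2(\Psi^{\times n-1}, \mathcal{G}^{\times n-1}, \gamma^{\times n-1}) \otimes L^2(\Psi, \mathcal{G}, \gamma) \to L^2(\Psi^{\times n-1} \times \Psi, \mathcal{G}^{\times n-1} \times \mathcal{G}, \gamma^{\times n-1} \times \gamma)$ which maps $f \otimes f' \mapsto f(\cdot) f'(\cdot)$. Defining $U_n(\cdot) = K_n(T_n(\cdot))$ yields our desired unitary transform. ■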


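Proof of Lemma 9 Proposition 2.6.12 in Kadison and Ringrose (1983) states that there exists a continuous linear operator $U : H_1 \otimes \cdots \otimes H_n \to H'_1 \otimes \cdots \otimes H'_n$ such that $U(h_1 \otimes \cdots \otimes h_n) = U_1(h_1) \otimes \cdots \otimes U_n(h_n)$ for all $h_1 \in H_1, \dots, h_n \in H_n$. Let $H$ be the set of simple tensors in $H_1 \otimes \cdots \otimes H_n$ and $H'$ be the set of simple tensors in $H'_1 \otimes \cdots \otimes H'_n$. Because $U_i$ is surjective for all $i$, clearly $U(H) = H'$. The linearity of $U$ implies that $U(\operatorname{span}(H)) = \operatorname{span}(H')$. Because $\operatorname{span}(H')$ is dense in $H'_1 \otimes \cdots \otimes H'_n$ the continuity of $U$ implies that $U(H_1 \otimes \cdots \otimes H_n) = H'_1 \otimes \cdots \otimes H'_n$ so $U$ is surjective. All that remains to be shown is that $U$ preserves the inner product. By the continuity of the inner product we need only show that $\langle h, g \rangle = \langle U(h), U(g) \rangle$ for $h, g \in \operatorname{span}(H)$. With this in mind let $h_1, \dots, h_N, g_1, \dots, g_M \in H$. We have the following

$$
\begin{aligned}
\left\langle U\left(\sum_{i=1}^{N} h_i\right), U\left(\sum_{j=1}^{M} g_j\right) \right\rangle &= \left\langle \sum_{i=1}^{N} U(h_i), \sum_{j=1}^{M} U(g_j) \right\rangle \\
&= \sum_{i=1}^{N} \sum_{j=1}^{M} \left\langle U(h_i), U(g_j) \right\rangle \\
&= \sum_{i=1}^{N} \sum_{j=1}^{M} \left\langle h_i, g_j \right\rangle \\
&= \left\langle \sum_{i=1}^{N} h_i, \sum_{j=1}^{M} g_j \right\rangle.
\end{aligned}
$$

We have now shown that $U$ is unitary which completes our proof. ■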


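Proof of Lemma 11 The fact that $f$ is positive and integrable implies that the map $S \mapsto \int_S f^{\times n}\, d\gamma^{\times n}$ is a bounded measure on $(\Psi^{\times n}, \mathcal{G}^{\times n})$ (see Folland (1999) Exercise 2.12).

Let $R = R_1 \times \cdots \times R_n$ be a rectangle in $\mathcal{G}^{\times n}$. Let $\mathbb{1}_S$ be the indicator function for a set $S$. Integrating over $R$ and using Tonelli's theorem we get

$$
\begin{aligned}
\int_R f^{\times n}\, d\gamma^{\times n} &= \int \mathbb{1}_R f^{\times n}\, d\gamma^{\times n} \\
&= \int \cdots \int \left(\prod_{i=1}^{n} \mathbb{1}_{R_i}(x_i) f(x_i)\right) d\gamma(x_1) \cdots d\gamma(x_n) \\
&= \prod_{i=1}^{n} \int \mathbb{1}_{R_i}(x_i) f(x_i)\, d\gamma(x_i) \\
&= \prod_{i=1}^{n} \eta(R_i) \\
&= \eta^{\times n}(R).
\end{aligned}
$$

Any product probability measure is uniquely determined by its measure over the rectangles (this is a consequence of Lemma 1.17 in Kallenberg (2002) and the definition of the product $\sigma$-algebra); therefore, for all $B \in \mathcal{G}^{\times n}$,

$$
\eta^{\times n}(B) = \int_B f^{\times n}\, d\gamma^{\times n}.
$$

■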